Learning Domain-Specific Discourse Rules for Information Extraction

Authors

  • Stephen Soderland
  • Wendy Lehnert
Abstract

This paper describes a system that learns discourse rules for domain-specific analysis of unrestricted text. The goal of discourse analysis in this context is to transform locally identified references to relevant information in the text into a coherent representation of the entire text. This involves a complex series of decisions about merging coreferential objects, filtering out irrelevant information, inferring missing information, and identifying logical relations between domain objects. The Wrap-Up discourse analyzer induces a set of classifiers from a training corpus to handle these discourse decisions. Wrap-Up is fully trainable, and not only determines what classifiers are needed based on domain output specifications, but automatically selects the features needed by each classifier. Wrap-Up's classifiers blend linguistic knowledge with real world domain knowledge.

Similar resources

AAAI 1995 Spring Symposium on Empirical Methods in Discourse Interpretation and Generation: Learning Domain-Specific Discourse Rules for Information Extraction

This paper describes a system that learns discourse rules for domain-specific analysis of unrestricted text. The goal of discourse analysis in this context is to transform locally identified references to relevant information in the text into a coherent representation of the entire text. This involves a complex series of decisions about merging coreferential objects, filtering out irrelevant inform...

Recognising Discourse Causality Triggers in the Biomedical Domain

Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vast amounts of knowledge in a short time. Automatic discourse causality recognition can further reduce their workload by suggesting possible causal connections and aiding in the curation of pathway models. We describe here an approach to the automatic identific...

Relational Learning of Pattern-Match Rules for Information Extraction

Information extraction is a form of shallow text processing which locates a specified set of relevant items in natural language documents. Such systems can be useful, but require domain-specific knowledge and rules, and are time-consuming and difficult to build by hand, making information extraction a good testbed for the application of machine learning techniques to natural language processing....

Corpus-Driven Knowledge Acquisition for Discourse Analysis

The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent years, the NLP community has been aggressively investigating statistical techniques to drive part-of-speech taggers, but application-specific text corpora can be used to drive knowledge acquisition at much higher leve...

Ontology-driven discourse analysis for information extraction

This paper presents a novel approach to discourse analysis within information extraction systems. It uses DRT as a formal representation of the linguistic context and a domain-specific ontology as a basis for computing conceptual relations between extracted events, thus establishing discourse coherence. The approach has been implemented within GenIE, an information extraction system...

Publication date: 2002